The DOP Estimation Method Is Biased and Inconsistent

Author

  • Mark Johnson

Abstract

A data-oriented parsing or DOP model for statistical parsing associates fragments of linguistic representations with numerical weights, where these weights are estimated by normalizing the empirical frequency of each fragment in a training corpus (see Bod [1998] and references cited therein). This note observes that this estimation method is biased and inconsistent; that is, the estimated distribution does not in general converge on the true distribution as the size of the training corpus increases.
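
The estimation method described in the abstract can be illustrated with a minimal sketch. It assumes that fragment counts are normalized among fragments sharing the same root label, which is how the relative frequency estimator is usually presented for DOP; the abstract itself only says the frequencies are normalized. The function name `relative_frequency_weights`, the `(root_label, fragment_id)` encoding, and the toy data are hypothetical and serve only as illustration.

```python
from collections import Counter, defaultdict


def relative_frequency_weights(fragments):
    """Estimate fragment weights by normalizing empirical counts.

    `fragments` is an iterable of (root_label, fragment_id) pairs, one pair
    per fragment occurrence extracted from a training corpus. Assumption:
    counts are normalized over all fragments with the same root label.
    """
    counts = Counter(fragments)

    # Total number of fragment occurrences for each root label.
    root_totals = defaultdict(int)
    for (root, _), count in counts.items():
        root_totals[root] += count

    # Weight = count of this fragment / count of all fragments with its root.
    return {frag: count / root_totals[frag[0]] for frag, count in counts.items()}


# Toy (made-up) fragment occurrences, not taken from any real treebank.
toy = [
    ("S", "S -> NP VP"),
    ("S", "S -> NP VP"),
    ("S", "S -> (NP she) VP"),
    ("NP", "NP -> she"),
]
weights = relative_frequency_weights(toy)
```

The weights for each root label sum to one by construction. That normalization alone does not make the estimator consistent, which is the point the note makes about this method.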

Similar resources

A Consistent and Efficient Estimator for Data-Oriented Parsing

Given a sequence of samples from an unknown probability distribution, a statistical estimator aims at providing an approximate guess of the distribution by utilizing statistics from the samples. One crucial property of a ‘good’ estimator is that its guess approaches the unknown distribution as the sample sequence grows large. This property is called consistency. This paper concerns estimators f...

Backoff DOP: Parameter Estimation by Backoff

The Data Oriented Parsing (DOP) model currently achieves state-of-the-art parsing on benchmark corpora. However, existing DOP parameter estimation methods are known to be biased, and ad hoc adjustments are needed in order to reduce the effects of these biases on performance. This paper presents a novel estimation procedure that exploits a unique property of DOP: different derivations can generat...

Structured Parameter Estimation for LFG-DOP using Backoff

Despite its state-of-the-art performance, the Data Oriented Parsing (DOP) model has been shown to suffer from biased parameter estimation, and the good performance seems more the result of ad hoc adjustments than correct probabilistic generalization over the data. In recent work, we developed a new estimation procedure, called Backoff Estimation, for DOP models that are based on Phrase-Structur...

Back-off as Parameter Estimation for DOP models

Data-Oriented Parsing (DOP) is a probabilistic performance approach to parsing natural language. Several DOP models have been proposed since it was introduced by Scha (1990), achieving promising results. One important feature of these models is the probability estimation procedure. Two major estimators have been put forward: Bod (1993) uses a relative frequency estimator; Bonnema (1999) adds a ...

Modeling of the Maximum Entropy Problem as an Optimal Control Problem and its Application to Pdf Estimation of Electricity Price

In this paper, the continuous optimal control theory is used to model and solve the maximum entropy problem for a continuous random variable. The maximum entropy principle provides a method to obtain least-biased probability density function (Pdf) estimation. In this paper, to find a closed form solution for the maximum entropy problem with any number of moment constraints, the entropy is consi...

Journal:
  • Computational Linguistics

Volume 28

Publication date: 2002